ABSTRACT

A gene is a basic component of DNA, located in the nucleus of the human cell. Data mining techniques currently have a large impact on human genetics and on gene sequence data analysis. Gene sequence analysis subjects a DNA sequence to systematic methods in order to determine the character, configuration, and nature of its genes. CBC and MNBC, applied to gene sequence data analysis, aim to separate diseased diabetic genes from the large stream of DNA gene sequence elements present in a voluminous body of statistical data. The technique attempts to validate and determine methods and tools for analyzing diseased gene sequences, and it helps to classify and interpret the results accurately and meaningfully. This study combines supervised and unsupervised machine learning techniques for data analysis: clustering is performed by CBC, whereas classification is performed by MNBC. The approach recognizes gene expressions by framing association rules according to support and confidence measures on the input data set. It extracts and filters the required data into clusters using the CBC technique, thereby drafting association rules. These rules are then applied to the testing dataset to filter the required (diseased) gene sequences. Finally, the MLRC algorithm is applied as the classification algorithm to identify the class labels of the test gene sequences in a large dataset. In medical diagnosis, gene data mining through gene discretization models helps to identify associations between DNA-gene-based progressions and inconsistencies in disease infection transformations. Above all, the approach overcomes the limitations of existing Support Vector Machine classification, which incurs high computational cost and an increased number of iterations.

Keywords: Data Mining, Data Analysis, DNA Gene, Gene Sequence, Support Vector Machine Classification
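
The support and confidence measures mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions: the support of an itemset is the fraction of records that contain it, and the confidence of a rule A => B is support(A and B) divided by support(A). The abstract does not say how gene sequences are discretized into items, so the item names below (`geneA_high`, `geneB_low`, `diseased`) and the toy records are purely hypothetical, and this is not the CBC, MNBC, or MLRC procedure itself.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Confidence of the rule antecedent => consequent."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(transactions, a)
    if support_a == 0.0:
        return 0.0  # rule is undefined when the antecedent never occurs
    return support(transactions, both) / support_a


# Hypothetical discretized gene records (assumed encoding, not from the paper).
transactions = [
    frozenset({"geneA_high", "geneB_low", "diseased"}),
    frozenset({"geneA_high", "diseased"}),
    frozenset({"geneA_high", "geneB_low"}),
    frozenset({"geneB_low"}),
]

rule_support = support(transactions, {"geneA_high", "diseased"})
rule_confidence = confidence(transactions, {"geneA_high"}, {"diseased"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In a rule-based filter of this kind, only rules whose support and confidence exceed chosen thresholds would typically be kept and applied to the test records; the threshold values are a design choice that the abstract does not specify.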